Text binarization in color documents

Authors

  • Euthimios Badekas
  • Nikos A. Nikolaou
  • Nikos Papamarkos
Abstract

This article presents a new method for the binarization of color document images. First, the colors of the document image are reduced to a small number using a new color reduction technique. Specifically, this technique estimates the dominant colors and then assigns the original image colors to them so that the background and text components become uniform. Each dominant color defines a color plane in which the connected components (CCs) are extracted. Next, in each color plane, a CC filtering procedure is applied, followed by a grouping procedure. At the end of this stage, blocks of CCs are constructed, which are then redefined by obtaining the direction of connection (DOC) property for each CC. Using the DOC property, the blocks of CCs are classified as text or nontext. The identified text blocks are binarized using suitable binarization techniques, with the remaining pixels treated as background. The final result is a binary image that always contains black characters on a white background, independently of the original colors of each text block. The proposed approach can also be used to binarize noisy color (or grayscale) document images. Several experiments that confirm the effectiveness of the proposed technique are presented. © 2007 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 16, 262–274, 2006; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/
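
The first two stages described in the abstract (color reduction to dominant colors, then CC extraction per color plane) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the dominant colors are estimated, so plain frequency counting is swapped in here as an assumption, and the function names (`dominant_colors`, `reduce_colors`, `connected_components`) and the tiny test image are hypothetical. The CC filtering, grouping, DOC, and final binarization stages are not shown because the abstract gives no detail on them.

```python
from collections import Counter, deque

def dominant_colors(image, k):
    """Return the k most frequent colors (an assumed stand-in for the paper's estimator)."""
    counts = Counter(px for row in image for px in row)
    return [color for color, _ in counts.most_common(k)]

def nearest(color, palette):
    """Index of the palette color closest to `color` in squared RGB distance."""
    return min(range(len(palette)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(color, palette[i])))

def reduce_colors(image, palette):
    """Map every pixel to the index of its nearest dominant color."""
    return [[nearest(px, palette) for px in row] for row in image]

def connected_components(labels, plane):
    """4-connected components of the pixels whose label equals `plane`."""
    h, w = len(labels), len(labels[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if labels[y][x] != plane or seen[y][x]:
                continue
            # Breadth-first flood fill from an unvisited pixel of this plane.
            queue, comp = deque([(y, x)]), []
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                comp.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and labels[ny][nx] == plane):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(comp)
    return components

# Hypothetical 4x5 image: white background, two red strokes, two noisy pixels.
W, R = (255, 255, 255), (200, 0, 0)
image = [
    [W, W, W, W, W],
    [W, R, W, R, W],
    [W, R, W, (198, 2, 1), W],
    [W, W, (250, 252, 255), W, W],
]

palette = dominant_colors(image, 2)        # the two dominant colors
labels = reduce_colors(image, palette)     # noisy pixels snap to the nearest one
text_plane = palette.index(R)              # color plane defined by the red color
components = connected_components(labels, text_plane)
print(len(components), [len(c) for c in components])
```

After the reduction, the near-red and near-white pixels take on the dominant color they are closest to, which is what makes the text and background components uniform before the CCs are extracted in each plane.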

Similar resources

Font and Background Color Independent Text Binarization

We propose a novel method for binarization of color documents whereby the foreground text is output as black and the background as white regardless of the polarity of foreground-background shades. The method employs an edge-based connected component approach and automatically determines a threshold for each component. It has several advantages over existing binarization methods. Firstly, it can...

An Analysis of Image Binarization Techniques for Natural Scene Images

Text extraction from natural scene images is an emerging field in computer graphics. Extracted text contains important information that can be used for various purposes, such as vehicle number plate detection to identify a vehicle, providing information about the surroundings to visually impaired persons, and preserving the information in historical documents. Binarization is a key process in text extra...

Detecting Text in Natural Scenes Based on a Reduction of Photometric Effects: Problem of Color Invariance

In this paper, we propose a novel method for detecting and segmenting text layers in complex images. This method is robust against degradations such as shadows, non-uniform illumination, low contrast, large signal-dependent noise, smear and strain. The proposed method first uses a geodesic transform based on a morphological reconstruction technique to remove dark/light structures connected to th...

AColDPS - Robust and Unsupervised Automatic Color Document Processing System

This paper presents the first fully automatic color analysis system suited for business documents. Our pixel-based approach uses mainly color morphology and does not require any training, manual assistance, prior knowledge or model. We developed a robust color segmentation system adapted for invoices and forms with significant color complexity and dithered background. The system achieves several...

An Enhancement of Images Using Recursive Adaptive Gamma Correction

The “Adaptive Approach for Historical or Degraded Document Binarization” concerns the large collections of ancient historical documents, printed or handwritten in native languages, that libraries and museums hold. Typically, only a small group of people is allowed access to such collections, as the preservation of the material is of great concern. In recent years, libraries have begun to digiti...

Journal:
  • Int. J. Imaging Systems and Technology

Volume: 16

Pages: 262–274

Publication date: 2006